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Abstract. Distance measurements are currently the most powerful tool to study the expan¬ 
sion history of the universe without specifying its matter content nor any theory of gravita¬ 
tion. Assuming only an isotropic, homogeneous and flat universe, in this work we introduce 
a mo del-independent method to reconstruct directly the deceleration function via a piecewise 
function. Including a penalty factor, we are able to vary continuously the complexity of the 
deceleration function from a linear case to an arbitrary (n-|- l)-knots spline interpolation. We 
carry out a Monte Carlo (MC) analysis to determine the best penalty factor, evaluating the 
bias-variance trade-off, given the uncertainties of the SDSS-II and SNLS supernova combined 
sample (JLA), compilations of baryon acoustic oscillation (BAO) and H{z) data. The bias- 
variance analysis is done for three hducial models with different features in the deceleration 
curve. We perform the MC analysis generating mock catalogs and computing their best-ht. 
For each hducial model, we test different reconstructions using, in each case, more than 10^ 
catalogs in a total of about 5 x 10^. This investigation proved to be essential in determin¬ 
ing the best reconstruction to study these data. We show that, evaluating a single hducial 
model, the conclusions about the bias-variance ratio are misleading. We determine the re¬ 
construction method in which the bias represents at most 10% of the total uncertainty. In all 
statistical analyses, we ht the coefficients of the deceleration function along with four nuisance 
parameters of the supernova astrophysical model. For the full sample, we also ht Hq and the 
sound horizon rs{zd) at the drag redshift. The bias-variance trade-off analysis shows that, 
apart from the deceleration function, all other estimators are unbiased. Finally, we apply the 
Ensemble Sampler Markov Chain Monte Carlo (ESMCMC) method to explore the posterior 
of the deceleration function up to redshift 1.3 (using only JLA) and 2.3 (JLA+BAO+F7(2)). 
We obtain that the standard cosmological model agrees within 3a level with the reconstructed 
results in the whole studied redshift intervals. Since our method is calibrated to minimize 
the bias, the error bars of the reconstructed functions are a good approximation for the total 
uncertainty. 

Keywords: dark energy theory, supernova type la - standard candles, baryon acoustic oscil¬ 
lations 
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1 Introduction 

Many indications of the accelerated expansion of the nniverse come from distance measnre- 
ments, snch as the distance modnlns of type la snpernovae (SNe la) [1, 2]. In the last two 
decades, several models have been proposed in order to explain this phenomenon and, in 
general, they can be classified into dynamic and kinematic models. Assnming the general 
relativity, the first is described by adding a flnid. Dark Energy (DE), in which several propo¬ 
sitions provide different DE eqnation of state (EoS) (for a review, see [3] and references 
therein). Other common dynamic approach is to modify the geometric setting of the gravita¬ 
tional theory instead of the energy-momentnm tensor, snch as the high-dimensional models 
[4] and f{R) theories [5, 6]. These approaches are labeled as dynamic in the sense that there 
are differential eqnations of motion for the metric, whose modifications consist in altering the 
sonrce term or the eqnation of motion itself. 
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In the context of kinematic models, the expansion history of the universe can be probed 
without assuming any theory of gravitation nor its matter content, and one only needs to 
dehne the space-time metric to study it. Considering the Friedmann-Lemaitre-Robertson- 
Walker (FLRW) metric, the recent expansion of the universe is described in terms of the scale 
factor and its n-order derivatives with respect to time, such as the Hubble, deceleration and 
jerk functions [7-10], as well as in terms of the luminosity distance [11, 12] (and references 
therein).^ 

Since the only unknown in this metric is the scale factor (and possibly the spatial curva¬ 
ture), once one of the kinematic functions is determined, the others can be found by integrating 
and/or differentiating it. Therefore, all kinematic functions are related. Which function will 
be chosen to be reconstructed depends on the questions one wants to answer.^ For example, 
one can model the luminosity distance by a linear piecewise function and obtain a statistically 
sound and unbiased ht using observational data. However, this study will not contribute at all 
to the understanding of the recent accelerated expansion since, in this case, the deceleration 
function is assumed zero in the entire redshift interval.^ 

Keeping the above idea in mind, the reconstruction of a kinematic function has been 
addressed using different methods. To understand some of these different methodologies, we 
divide the problem in two parts: (i) the theory underlying the observable quantities and (ii) 
the relation between the observables and the data, including the data probability distribution. 

Regarding (i), the kinematic model provides a perfect description of the observables, not 
taking into account any noises nor errors in the measurements. For the sake of argument, 
suppose that (ii) is not part of the problem, i.e., the data is perfectly known. In this case, we 
could use a parametric function and adjust its parameters, such that the observable (hope¬ 
fully) matches the data points, or we could use the data points to determine the observable 
function using, for example, interpolation. In this sense, we say that the analysis is model- 
dependent when a given parametric function is chosen a priori and model-independent when 
we use the data to determine it. 

There are also two main procedures to treat (ii). We can assume which is the probability 
distribution of the data and, consequently, the only problem left is to determine the observable 
curve, which can be done in a model-dependent or independent way, as discussed above. In 
statistics texts, this is described as a parametric method. On the other hand, we can follow 
an even more conservative path and not impose a given probability distribution for the data. 
This way, known as non-parametric, also uses the data to reconstruct their own probability 
distribution. 

In the model-dependent parametric approach, one assumes a priori a specihc functional 
form of kinematic quantities, such as the deceleration function q{z), and a probability dis¬ 
tribution of the data [16-19]. A feature of this strategy is that its results have potentially 
smaller error bars when compared to the others. After all, one is introducing a reasonable 
set of assumptions which can lead to biased results. A natural improvement to this is to ap¬ 
ply a model-independent approach, where one tries to reconstruct the curve when still using 
the assumed distribution for the data. Among these approaches is the Principal Component 
Analyses (PCA), in which the kinematic function is described in terms of a set of basis func- 

^There are also works which explore the properties of the DE equation of state (i.e., assuming the general 
relativity) using kinematic quantities, e.g., [13, 14]. 

^For a discussion about the issues in deriving a kinematic function from a reconstructed one of another 
quantity, see [15]. 

^The second derivative of the distance, given by a linear spline, is zero everywhere. 
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tions and the data is used to determine which subset of this basis is better constrained. Then, 
the function is reconstructed by using this subset [13, 17, 20-23]. 

Another possibility is to use smoothing methods [24, 25]. In this model-independent non- 
parametric case, only mild assumptions are made about the data and, usually, no assumption 
is made about the model. This allows a direct translation of the data into a kinematic curve. 
Still in this context, we also have the Gaussian Process (GP), in which one chooses to model 
directly the probability distribution of the kinematic function itself [26, 27]. For a more 
complete list of non-parametric methods see [28] and references therein. 

Recovering both the probability distribution of the data and a reconstructed kinematic 
function require a large amount of data and, in practice, the current observational cosmology 
did not seem to have reached this level yet. This is evinced by the results obtained so far in 
the literature [22, 25, 27, 28]. Regarding the data, there is a good perspective to increasingly 
improve their probabilistic descriptions, since different error sources, such as the systematic 
ones, are being included in their modeling (e.g., [29, 30] ). This presents an additional 
challenge to the non-parametric methods, as they must incorporate all the error sources in 
their reconstruction. 

Even in a model-independent and non-parametric approach, the estimated curves are 
not free from assumptions. Each method has some internal choices of parameters. Gurrently 
in the literature, these parameters are obtained using the observational data. However, as we 
usually have only one set of data, doing so will calibrate the method for this one particular 
realization of the data. In this case, there is no way to know if this calibration provides the 
best balance between bias and variance. This difficulty can be circumvented using different 
realizations of the same data set. For a given calibration, i.e., for a given choice of the internal 
parameters, the method is applied to a large number of simulations obtaining the bias and 
variance for this calibration. Then, repeating this process for different calibrations one can 
hnd the best suited one for the chosen data set. In other words, the internal parameters in 
these reconstructions must not be related to one particular realization of the data, but to 
their probability distribution. 

This idea can be extended to the study of the statistical properties of the data. For 
example, in [28], among other results, the authors apply a bootstrap-like procedure to calibrate 
the smoothing parameter applied to the data. This kind of analysis can provide a insightful 
information about the statistical properties of the data when little is known about their 
relationships. 

In this work, we use the current available observational data for small redshifts (z < 2.3) 
and their likelihoods, namely, the Sloan Digital Sky Survey-II and Supernova Legacy Survey 
3 years (SDSS-II/SNLS3) combined Joint Light-curve Analysis (JLA) SNe la sample [30], 
baryon acoustic oscillation (BAG) data [31-34] and ff(z} measurements [35-38]. Gurrently, 
there is not enough data to perform a full model-independent and non-parametric reconstruc¬ 
tion of the recent evolution of the universe. Therefore, we use the usual likelihood for these 
data, but, to be conservative, we reconstruct q(z} along with some astrophysical parameters 
of SN la, the drag scale (present in the BAG likelihood) and the Hubble parameter Hq. 

Besides the above data, there is also a wealth of data concerning the large scale structure 
connected to the perturbations around a FLRW metric, such as the temperature fluctuations 
of the cosmic microwave background (GMB) [39, 40]. Since we assume no dynamic model, we 
would have to propose a kinematic one for the perturbations. Such model is not feasible as it 
would require a set of functions of both time and space. In principle, one could also directly 
use derived observables, as, e.g., the GMB distance priors [41], to ht the background model. 
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However, these parameters are obtained in the context of a specihc model, e.g., ACDM. Thns 
this model wonld be indirectly reintrodnced in the resnlts. For this reason, we choose not to 
nse these data and focns on the distance-like measnrements. 

We reconstrnct q{z) nsing a model-independent parametric approach, which consists in 
describing q{z) by a cnbic spline. The choice of the deceleration fnnction comes from the fact 
that one can nse snch fnnction to directly test the energy conditions [42, 43]. Likewise, the 
deceleration fnnction is related to the nnderlying dynamics of the metric, since it is a simple 
combination of hrst and second derivatives of the scale factor. Therefore, their knowledge is 
necessary to constrain models which chooses a different dynamic for the gravitation sector 
(Alves et al. to be snbmitted). In addition, the spline method allows ns to vary the complexity 
of the fnnctional form as a fnnction of the knots nnmber, for example. Here we introdnce a 
novel method to continnonsly vary the complexity of the reconstrncted fnnction. As a resnlt 
of this analysis, we also obtain the reconstruction of those SN la parameters and the drag 
scale. 

The paper is organized as follows. In section 2 we review basic concepts, snch as q{z)^ 
and the few assnmptions made here. In section 3, we discnss some approaches nsed in the 
literatnre and some of their respective drawbacks. In section 4 we present onr reconstrnction 
method and the tools to vary the fnnction complexity and to select the best one. Following, 
section 5, we specify the observational data sets nsed in onr stndy and their respective likeli¬ 
hood fnnctions. We then perform a Monte Carlo analysis in different scenarios calibrating the 
method (section 6). Finally, we nse this calibration to reconstrnct the deceleration fnnction 
nsing the observational data, in section 7, and we snmmarize onr conclnsions in section 8. 


2 Deceleration function q{z) 


Measnrements of the CMB [39, 40] show that the nniverse is nearly homogeneons and isotropic 
at large scales. Therefore, we assnme that the nniverse follows the cosmological principle, 
restricting the metric to the FLRW metric. 


ds^ =—(? dt^ + a^{t)\dr^ + S‘l{r){d9‘^ + SIT? 9d(l?)\ , ( 2 . 1 ) 


where Sk{r) = (r, sin(r), sinh(r)) for flat, spherical and hyperbolic spatial section {k = 
0,1, —1), respectively, c is the speed of light and a{t) is the cosmological scale factor. In this 
case, the expansion history of the nniverse can be dehned knowing a{t) and k. 

In practice, we do not measnre a{t) directly, bnt related qnantities snch as the distances 
to astronomical objects. Considering a nnll trajectory of photons emitted by a galaxy traveling 
along the radial direction to ns, we have that 


r 



dt^ 


( 2 . 2 ) 


where te and Iq are the emitted and observed times, respectively. Expanding the scale factor 
to second order aronnd to, gives 
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where oq is the scale factor today and Hq and qo are, respectively, the Hnbble and deceleration 
fnnctions at to, 
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Rewriting the comoving distance Dc and the deceleration fnnction 
1 + z = ao/o, we obtain 


Dc{z) = aor = 


c r dz' 

Hoh W) 


in terms of the redshift, 


(2.5) 


and 


q{z) 


(l + z) dH{z) 
H{z) dz 


( 2 . 6 ) 


whose integral solntion is 

E{z) = exp f (2.7) 

Jo -r z 

and where E{z) is the normalized Hnbble fnnction, E(z) = H{z)/Hq. 

Eqnations (2.5) - (2.7) evince that we shall reconstrnct q{z) in order to access local 
information abont the accelerated/decelerated phase [42]. Besides, assnming that q{z) is 
continnons gnarantees that E{z) and Dc{z) are also continnons fnnctions and at least once 
and twice differentiable, respectively, althongh the opposite is not trne. 


3 Review of other approaches 

In this section we briefly present some of the most nsed methods in reconstrncting the expan¬ 
sion history of the nniverse. We discnss some intrinsic issnes of these methodologies, which 
motivated ns to develop a novel approach (presented in section 4). 

3.1 Parametric models 

In general the Taylor series approaches have two related problems. The first is the convergence 
radins of the Taylor expansion itself, which can only be estimated since the real scale factor 
is natnrally nnknown. On the other hand, as we are fitting the coefficients of snch expansion, 
the fnnctional form for the scale factor (or distance) can be interpreted as a simple polynomial 
interpolation. In this sense, changing the time parameter can be nsefnl [44] and provide a 
better polynomial interpolation. However, this leads ns to the second problem, the Rnnge’s 
phenomenon. That is, after a given order, higher order polynomials provide worse and worse 
approximations. Therefore, when nsing the Taylor expansion one shonld stay on a small 
convergence radins, which wonld restrain the analysis to a very small bnt nnknown redshift 
or nse a polynomial interpolation keeping in mind its caveats. 

More generally, the problem of finding a good kinematic description of the expansion 
history can be addressed nsing a parametric method. In this context, one assnmes a priori 
a specific fnnctional form of a kinematic fnnction, like the polynomial form discnssed above, 
then proceeds by fitting its parameters nsing observational data. For example, in Refs. [16- 
19] they fit different fnnctional forms of the deceleration fnnction q{z). The drawback of this 
method is that the choice of a fnnctional form introdnces a form-bias in the estimates if the 
fnnctional form is different from the trne one. As we do not know it, the resnlt of snch fit 
can be misleading since, even if the parameters’ error bars are small, their form-biases can 
still be large. A more conservative approach is to nse flexible fnnctional forms. However, this 
translates in nsing many parameters and, conseqnently, obtaining larger error bars. In this 
way, there is a natnral trade-off between variance and form-bias which shonld be evalnated 
to determine the optimal reconstrnction. 

An additional, less discnssed, difhcnlty is the estimator-bias. The fnnctional forms are 
nsnally fitted nsing a Least-Sqnares (LS) or Maximnm-Likelihood (ML) approach. As it is 


- 5 - 





well known, both approaches can provide biased estimators for the parameters (ML estimators 
are usually asymptotically unbiased). This means that, even if we knew the correct functional 
form, the fact that we have only a hnite number of observations can lead to estimator-biases.^ 
Heuristically, when htting a functional form with n parameters using m data points we will 
have m/n observations per point. Hence, if the estimators are only asymptotically unbiased, 
then the higher the number of parameters higher the estimator-biases. Thus, the hnal trade¬ 
off must consider variance, form-bias and estimator-bias. 

3.2 Principal Component Analysis 

Another popular methodology is the Principal Component Analysis (PCA) [13, 17, 20-22]. In 
this approach the kinematic function is described by a flexible parametrization (e.g., in [13] 
they express the DE EoS as a constant piecewise function within A = 50 bins). From the es¬ 
timated covariance matrix of these parameters, the eigenvectors and eigenvalues are deduced. 
Say, for example, that we choose the hrst 5 eigenvectors, whose eigenvalues correspond to the 
smallest variance terms. This means that the original division in 50 bins is being described 
by a 5-dimensional parametrization. 

The eigenvectors, whose eigenvalues correspond to the smallest variance terms, provide 
the parametrization which is better constrained by the data. The appeal of this method 
is that it provides a straightforward way to determine the curves better constrained by the 
data. The choice of how many eigenvectors should be used can be answered by looking the 
variance-bias trade-off [13]. On the other hand, the method as described above is subject 
to a potential drawback. The relationship between different kinematic functions are given in 
terms of their integral in time, for example, the relation between the deceleration and the 
Hubble functions is given by Eq. (2.7) while the distance is another integral of the Hubble 
function [Eq. (2.5)]. Note that, to calculate the distance to a given SNIa at redshift Zj, the 
deceleration function is integrated twice between [0,Zi]. Therefore, all bins in this interval 
contribute to the hnal value of the distance. This leads to the following problem, the hrst bins 
contribute to the value of the distance for almost all supernovae, while the hnal bins affect only 
few objects. Besides, since the initial bins always contribute to the value of the distance at 
higher redshifts, they will be naturally correlated to them, any modihcation in their value will 
have to be compensated by the other. These facts have the following consequence, the best 
constrained modes will be strongly connected to the hrst bins while the worst will be related 
mostly to the last bins. Therefore, when one chooses to use only a few (better constrained) 
eigenvectors, the hnal parametrization will provide almost no power in the last bins. This 
is natural since it is equivalent to choose the coefficients of the last eigenvectors to be hxed 
at zero. This problem can be seen in [13], where the issue persists even when the authors 
consider a forecast with 3000 SNe la uniformly distributed in redshift. This problem was 
also noted in [21]. In the latter, they realized that the hrst modes have this dehciency and 
proposed the addition of a new parameter to circumvent this problem. 

Similarly to what is commonly used in the PCA approach, the authors in [46, 47] re¬ 
constructed the EoS function describing it as a set of bins (discontinuous piecewise constant 
function of the scale factor). Instead of limiting the estimated curve variance by using a 
small set of eigenvectors (principal components), they introduced the correlated prior, in an 
approach similar to the GP, which treats the reconstructed curve as a set of random variables 
emerging from a multidimensional Gaussian distribution. This prior introduces a correlation 

^For a detailed discussion of this problem in the context of the cluster number counts see [45] . 
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between the bins controlled by the correlation length and prior strength. Once the prior is 
calibrated, the problem described in the above paragraph can be systematically resolved. 

The discnssion above elncidates an important characteristic of the PCA, it is dependent 
on the initial description of the fnnction. In many works the simple step fnnction (or bins) 
were nsed. If another basis fnnctions were nsed, with different behavior at large redshifts, we 
wonld end np with a similar problem, i.e., the large redshift behavior of the best constrained 
basis fnnction wonld dominate at large redshifts, providing biased estimates at these points. 
We conclnde that, in the specific context of constraining a fnnction by its integral, the PCA 
approach has this potential problem. 


3.3 Smoothing methods 

As we mentioned in section 1, these approaches are model-independent and non-parametric. 
Therefore, they snit the stndy cases where the data probability distribntion is nnknown. 
However, as pointed ont by Montiel et al. [28], these methods find difhcnlties dne to the 
limited amonnt of observational data or even dne to some features of the methods themselves, 
snch as the size of the smoothing parameter and the assnmptions on priors or fidncial models. 
Besides, since no assnmption is made abont the data distribntion, one cannot nse resampling 
to perform a self-validation, bnt only bootstrap like procednres, e.g., jackknife, which nsnally 
reqnires large samples. 

Finally, the fact that this method makes snch minimal assnmptions is not necessarily 
nsefnl. As one can only reconstrnct the direct variable associated to the observable, i.e., cos¬ 
mological distances and H(z), any inference abont the derived kinematic qnantities is limited 
since it is highly dependent on the smoothing techniqne as, e.g., smoothing linear splines has 
always zero second derivative, and top-hat moving average filters are non-continnons. There¬ 
fore, any analysis abont kinematic qnantities, different of the reconstrncted one, reqnires new 
assnmptions [25, 48]. 


3.4 Gaussian Process 

In this approach, instead of modeling a kinematic fnnction, one chooses to model a prob¬ 
ability distribntion for the cnrve as a Ganssian probability distribntion. This assnmption 
dictates the data probability distribntion by relating both the cnrve and observable probabil¬ 
ity distribntions. In this sense, this approach nnifies the two aspects of the model, the data 
distribntion and the cnrve reconstruction. All the assnmptions are comprised in the mean 
cnrve and the two-point covariance, which define the Ganssian distribntion of the cnrve. 

The drawback is similar to the parametric procednre. In general, one has to assnme a 
fnnction to describe the mean of the GP and a two-point fnnction to describe the variance. 
If the considered mean fnnction differs from the trne one, it will impose a bias in the recon- 
strnction. See, for example, references [26, 27], where they assnme a constant mean fnnction. 
Since the GP determines the observable statistical distribntion, it is also necessary to inclnde 
the data distribntion to calcnlate the joint probability distribntion of both cnrve and data. 
In this case, one has a model-independent bnt parametric method in the sense that one is 
assnming a particnlar distribntion for the data. The GP validation also has to be performed 
for a nnmber of realizations of the data, since the indirect cnrve determination throngh a 
Ganssian distribntion can lead to bias in an nnpredictable way. 
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4 Reconstruction of q{z) 


Broadly speaking each strategy described in section 3 has a better suited application. As a rule 
of thumb, to use less hypothesis it is necessary to have more data. Therefore, if the amount 
of data is limited, the generality of GP and soothing methods, for example, are restricted, 
leading to underdetermined problems. Besides that, including natural hypothesis can also 
be considerably difficult in those non-parametric studies, e.g., after the determination of the 
cosmological distances through a smoothing method it is necessary to add new assumptions 
to describe the Hubble function. 

Therefore, in this work we adopted a model-independent parametric approach. Nonethe¬ 
less, to be conservative we reconstruct the kinematic curve along with all the phenomeno¬ 
logical parameters related to the modeling of each data set. Doing so, we minimize the 
assumptions on the data distribution bypassing any bias which could result from it. In a 
model-independent technique we need to use a set of functions to perform the reconstruction. 
To avoid the problems described above on the PCA approach, we use a cubic spline to recon¬ 
struct our observable, as described in section 4.1. One advantage of the cubic spline is that 
it is continuous and twice differentiable on every knot, thus, all the parameters (the value of 
the function on each knot) are related through these conditions.^ 

Another point on the model-independent approach is how to deal with the bias. In the 
PCA approach one uses the particular set of data to determine which basis functions form 
the minimal set. Depending on the data distribution, this tends to favor the intervals where 
the sample is denser, which usually creates biases on the other regions. To avoid this issue, 
we impose a global penalization (see section 4.2) on the curvature of the curve. Applying the 
same penalization in each interval, we evade the localization problem described before. 

4.1 Piecewise deceleration function 

In this work, we avoid making arbitrary choices of the q{z) form, and, consequently, a priori 
restricting it to specific functional forms, by approximating q{z) by a piecewise third-order 
polynomial function, i.e., a cubic spline. 

The first step to build an estimator of q{z) [denoted as q{z)\ is to specify the redshift 
interval (domain D) in which the function is defined. This interval is D = [zmim Zmax]^ where 
Zmin ans Zmax are the minimum and maximum redshifts of the used data. The next step is to 
choose the partition of the domain D into n sub-intervals in which we define q{z) as a cubic 
polynomial function, namely. 



z G [^0,2:1) 
2 G [^1,2:2) 


Pi{z) = 01 ( 2 ; - 2 : 1 )^ -I- hi{z - zi)"^ + ci{z - zi) + di 


q{z) = < 


— 0,n—i(^Z Zn—i) -|- bn—l(^Z Zn—l) Cn—li^Z Zji—i) -|- dri—1 2 G , 


where zq = Zmin, Zn = Zmax and Pi is the cubic polynomial defined in the Tth sub-interval. 
Note that each polynomial Pi{z) in the segment [zi,Zi^i) depends on 4 parameters (oj, 


®In practice, once the value of the function at each knot is defined, the coefficients of each cubic polynomial 
are obtained from the solution of a linear system including not only the knots on Its neighborhood but all 
knots. 
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Ci and di) and, consequently, we would need to estimate 4n parameters to define q{z) in the 
whole domain D. However, imposing the following continuity conditions, 

Pi+l{Zi+l) = Pi{Zi+l), 

Pi+x{Zi+l) = Pi{Zi+l), 

pl+l{Zi+l) = Pi{Zi+l), 

on the n — 1 internal knots i G (l,n — 1), where ' denotes the derivative with respect to z, 
and the two not-a-knot boundary conditions 

%'i^i) = di 'and q^_2{zn-i) = ^n-i{zn-i), (4.1) 

we end up with only n + 1 parameters to determine Q = {do,dn}, i.e., the values of q{z) 
at each knot. As we have a one to one relation between the function in each knot qi = q{zi) 
and the parameter di of each polynomial, from now on we rename the set as Q = {qi} for the 
sake of notational simplicity. Thus, using this cubic spline approximation, we fit the vector 
Q in order to obtain the estimates of the deceleration function (see description in section 6). 

Any interpolation method introduces an error source limiting the set of functions able 
to be reconstructed. The interpolation error for a cubic spline has an upper bound propor¬ 
tional to both the largest distance between nearby knots to the fourth power and the fourth 
derivative of the function with respect to z (for details see [49]). At first sight, larger the 
number of knots, smaller the interpolation error. But in practice, the number of knots is 
limited by the increasing number of parameters. Besides, there is also the overfitting that 
rises when fitting the parameters using data points. As a rule of thumb, one should choose re 
and, consequently, the intervals between knots, such that the estimated function is expected 
to be well approximated by a cubic polynomial in these intervals. One can test the choice of 
re = rei applying the reconstruction for another one, e.g., re = rei -|- 1, and probing the results 
for any significant improvements on the fit. Notwithstanding the interpolation error, another 
important source of uncertainty comes from the statistical errors (bias/overfitting), as we will 
discuss in the next section. Finally, the use of cubic splines represents a large advantage in 
comparison to the step functions frequently used in the literature. For a constant piecewise 
function, the interpolation error is bounded by the first derivative times the largest distance 
between nearby knots. As a result, the number of knots (and, consequently, parameters) 
necessary to reconstruct functions with the same interpolation error bound is much larger in 
a binned approach. 

4.2 Function complexity 

Assuming a cubic spline to approximate q{z), we are able to address both model-dependent 
and model-independent parametric methods by varying the number of knots. The simplest 
case, re = 4, is equivalent to consider that q(z) is a third-order polynomial. On the other 
hand, we approach a model-independent case increasing re. The complexity of the function 
q{z) is, in principle, parameterized by the number of knots, as the number of knots goes to 
infinity any interpolation error drops to zero. Nonetheless, the choice of the domain partition 
is rather arbitrary and, at first, one would have to test different options in order to achieve, 
for example, a “model-independent limit” trying to minimize the over-fitting error. Another 
difficulty inherent of this approach is that the number of knots is a discrete variable and, as 
such, it is difficult to include it as another parameter in the analysis. 
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Instead of varying the number of knots by adding/removing actual knots to the function 
representation, we can fix the number of knots in some large value and penalize independent 
values of the parameters. For example, given that the parameters are just the values of the 
function at each knot, a penalty factor as a increasing function of |% — 9i+i| will correlate all 
parameters g*. The correlation will be proportional to the weight of the penalization in the 
analysis, higher the weight more correlated are the parameters. In the strong correlation limit, 
all parameters would be equal, i.e., qi = qt+i- In this last case, even with a large number of 
knots, the effective number of degrees of freedom would be one. In short, varying the weight 
of the penalization, we can vary the effective number of degrees of freedom, circumventing 
the difficulties described above.® 

In practice, we include a set of penalty factors Pi{(Ti) in our estimator. The initial 
likelihood is L{D,6), where the vector D represents the data set and the vector 6 all the 
parameters, including the spline parameters qi and other parameters as described in sec¬ 
tions 6.1 and 6.2. To obtain the parameter estimators, we add to the likelihood L{D,6) the 
penalization Pi{ai) defining the penalized likelihood 


n—1 


- 2 In 


Lp{D,e) 


= -2 In 


L{D,e)\+Y,P^{<yi), 


(4.2) 


i=2 


where the penalty factor is given by 

2 


PMi) = 


qi - qi 

Oi 


qi = 


{qi-i + qi+i) 


^ahs 


(4.3) 


and we use aahs = 10“®.^ The penalization factor is schematically illustrated in figure 1 
showing the positions of qi and qi. 

We control the complexity of q{z) by varying the value of the relative error CTrei. For 
example, we are able to recover a high complexity function, in particular, a full n + 1 knots 
spline for large Urei, and a straight line in the entire redshift interval when cirei goes to zero. 
The former has many coefficients and can tend to fit the data noise, i.e., it is over-fitting 
dominated. The second naturally sharpen the constraints on {qi}, but they can be biased if 
the assumed functional form significantly differs from the true one. It is worth mentioning 
that this penalty factor allows us to explore a wide range of functional forms, since its simplest 
case is a linear function. Meanwhile, without using the penalty factor, the simplest model 
would be a third-order polynomial. 

Finally, we emphasize that, in principle, one could use a large set of knots while con¬ 
straining the allowed shape with the penalty factor. The restriction will be practical, the 
computational cost increases with the number of knots. Therefore, one should find the best 
balance between computational cost and flexibility of the method. 


4.3 Bias-variance trade-off 

Giving the penalty function which allows us to explore different complexity forms of q{z), 
we now have to introduce a criterion to determine the best cirei value. We want to find the 

®One can also easily interpret, using the Bayesian point of view, the penalty factor as a prior on the fitted 
function. 

^The Gabs factor guarantees that, even if some qi ~ 0 the denominator in the penalty factor does not go 
to zero. 
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Figure 1. A descriptive example of the penalization on a cubic spline. For each three knots, the 
penalization is proportional to the distance between the straight line connecting the first and third 
knots, and the interpolation function. 

scenario where the combined error due to the biases (estimator- and form-bias) and the over- 
htting error is minimized. For this, we decompose the error into bias and variance components 
as described below [50-52]. 

We create a controlled environment introducing a hdicual deceleration function 
which is determined by a given set of values for the parameters 9 . The idea is to use this 

function to generate a new data set D, which is possible since we know the likelihood of 
the data L{D,9). We dehne the ML estimators using the penalized likelihood, then, given a 

simulated sample we obtain the estimates 9{D^^^) computing 



This provides an implicit dehnition of the function In principle, we could 

calculate the bias in the estimator integrating the function i.e., 

(e^ = J dD^^'>Lp . 

However, such integration is computationally unfeasible. It is a Af dimensional integration, 
where N is the number of data points, and the function is usually only determined 
numerically by maximizing the penalized likelihood. 

Instead, we use the Monte Carlo (MC) approach to deal with such integrals.® Since we 

®Note that here we are using the MC method to perform integrals in the data D given a set of parameters 
8^'^. Therefore, we are sampling new data from the given likelihood. This procedure is similar but has different 
applications than the MC used to study the parameter space 6 given a data set D. 
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know the probability distribution of the data, we can create a new realization of the data 
(mock catalog) by resampling,® i.e., using a (pseudo)random number generator to create 
a new data set. Given a large enough number of realizations m, we can approximate the 
expected value of a function of the data as 

- m 

(/(o))-;;;E/(«"')■ M 

1=1 


Using these tools, we introduce the mean squared error (MSE) of approximating a fidu¬ 
cial function g'®^(z) by q{z; firei). For a fixed Urei, it is 


MSE 


arel) 



m L 


(4.5) 


Given the estimate of the expected value. 


(g(z,0-rel)) ^ 


1 

m 


^g('^(z;0-rel), 

1=1 


(4.6) 


we have that (omitting cJrei for simplicity) 

g«(z;arei) - q^Hz)]" = [q^^Hz) - (g(z)) + {q{z)) - q^^z)]" 

= 'f\z)-{q{z))]\[{q{z))-q^^{z)Y 
+ 2\q(^\z)-{qiz))] [{qiz)) - q^-^iz)] , 


and, therefore. 


- m 

MSE c. {{q{z)) - q^^zf + - (g«(z) - (g(z))) 


(4.7) 


(4.8) 


1 = 1 


Since 


1 

m 


m 

(«''>( 2 ) - «( 2 )>) {«( 2 )> - 9 "( 2 )) =. 0 . 


1=1 


(4.9) 


The first term of eq. (4.8) is the squared bias bq{zY and the second is the variance. Since this 
variance estimator is biased, in this work, we evaluate the bias-variance trade-off computing 

1 2 

MSE C. ((«3)) - ,“(2))" + ^ - «(s))) 


1=1 


= bq{z; areif + Var {q{z; Urei)) • 


(4.10) 


Both the squared bias and the variance have the same weight in the above expression. 
Nonetheless, when applying the reconstruction for real data, we usually do not have access to 

®The likelihoods used in this work were all implemented on top of the data description objects of the 
Numerical Cosmology library [53]. These objects automatically provide the resample feature for any likelihood 
using them. 
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an estimate of the bias. Therefore, minimizing MSE can potentially lead ns to a methodology 
with a large bias (as we will see in sections 6.1 and 6.2). To avoid this, in what follows we 
will minimize the MSE satisfying the constraint 


bq{z,arei) ^ a{q{z;arei)) = ^/Var{q{z;are\)), (4.11) 

^rel)) 

where mb controls the maximnm ratio between bias and variance. 

The variance of the reconstrncted cnrve q{z] cirei) can be written in 
ance of the spline parameters Cov (^j, qj). In tnrn this covariance can be 
nnbiased covariance estimator 

Cov {qi, qj) = - i^j)) , (4.12) 

1=1 

where qf'^ is the best-fitting valne of the i-th spline parameter nsing the /-th mock catalog. 

5 Observational data 

As q{z) is not a direct observable, we need to nse other qnantities to access {qi}- In this 
section, we present the samples of type la SNe, BAO and H(z) measnrements, and also their 
respective likelihood fnnctions that we ntilize to recover the deceleration fnnction q{z). 

5.1 Type la supernova data 

We nse the JLA sample [30] of 740 SNe la, whose likelihood is 

- 21n(L5jv/a) = (5.1) 

where the data covariance is a combination of the systematic and statistical errors CsNia = 
^sys “1“ Cgtat /5)5 ^Ild 

Arm = ruBi - (5.2) 

= msi - 51ogio(T>L(2:]'®',2;-“^)) + aXi - jSCi - Mh, + hlogi^ic/H q) - 25. 

mBi is the rest-frame peak B-band magnitnde of the z-th SN la, and zf^^ and z’['^^ are its 
heliocentric and CMB frame redshits, respectively. The SN la astrophysical model contains 
fonr parameters (a,/?, Mi, M 2 ), where the first two are related to the stretch-lnminosity and 
colonr-lnminosity, respectively, and Mi and M 2 are absolnte magnitndes. The Inminosity 
distance [54] is 


terms of the covari- 
estimated nsing the 


= + 


(5.3) 


where the transverse comoving distance in a flat spatial sections nniverse is Dm{z) = Dc{z). 
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5.2 Baryon acoustic oscillation 

The peak position of the angular correlation function of the matter density can be measured 
by the distance ratio Dv{z)/rs{zd) (see [55, 56] and references therein). The volume-averaged- 
distance for perturbations along and orthogonal to the line of sight is defined as [57] 


Dv{z) 


Dm{z) 


, cz 

Wz)\ 


1/3 


(5.4) 


The sound horizon rs{zd) at the drag redshift Zd (i.e., epoch at which baryons were released 
from photons) is 


rd = rs{zd) = 


1 



Csiz 

E{zy 


(5.5) 


where Cs{z) is the sound wave speed in the photon-baryon fluid. 

In this work we use 6 BAO data points as described in table 1 in appendix A. The first 
is measured by Beutler et al. [31] using galaxies from the 6dF Galaxy Survey. Padmanabhan 
et al. [32] reported an improved data obtained with the reconstruction method [58] using 
Luminous Red Galaxy sample from SDSS Data Release 7 (DR7). Kazin et al. [33] give three 
points computing the power spectrum and correlation function of galaxies from the WiggleZ 
Survey in three correlated redshift bins. The last data is obtained by Ross et al. [34] which 
used galaxies from SDSS DR7 with z < 0.2. 

The BAO likelihood is 


— 2 In Lbao 




—1 
-BAO 



2 In Tboss ) 


(5.6) 


where is the observable vector calculated using the theoretical model, i.e., the components 
are given by bf^ = Dv{zi)/vd calculated at each redshift (second column of table 1). The 
vector b represent the observed version of these quantities and its components are provided 
by the third column of table 1. The matrix is the inverse covariance matrix appearing 

in the BAO likelihood (see table 1). Finally, Ross et al. [34] pointed out that their data 
should be used considering their estimate of the likelihood distribution [which we called Lross 
in eq. (5.6)], since, in this case, the Gaussian distribution is not a good approximation. 

References [31] and [32] used the Eisenstein &: Hu [59] (EH98) fitting function to compute 
while [33, 34] used GAMB [60]. As mentioned in Ref. [33], the difference between 
and r*! CAMB order of 3%. So due to the current error magnitude of these data, this 

difference is relevant and, hence, we have to re-scale the data such that all measurements 
refer to the same method. In particular, we multiply Beutler and Padamanabhan’s data by 
^d^EHgs/^dJcAMB = 1-027 and 1.025, respectively. 

The BAO observable depends on the kinematic model through Dv{z) and also requires 
the Td value. However, to calculate rd theoretically, we would need to extend the kinematic 
model to high redshifts and to compute the decoupling redshift Zd- We avoid this making Vd 
a free parameter in the analysis. Therefore, throughout this work we fit rd along with the 
other parameters. 


5.3 Hubble function 

We work with 21 measurements of H{z): 11 are provided by Stern et al. [35] in the redshift 
range 0.1 < z < 1.75 obtained from the spectra of red-enveloped galaxies; Riess et al. [36] 
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estimated the Hubble constant Hq using optical and infrared observations of over 600 Cepheid 
variables; other 8 measurements are comprised within 0.1791 < z < 1.037 as presented in 
Moresco et al. [37]. These last were obtained from the differential spectroscopic evolution 
of early-type galaxies with respect to z. In particular, we use their H{z) values computed 
assuming the BC03 model for the differential evolution. The last datum is an estimate of 
H{z)ri^/{1 + z) at z = 2.3, and it is derived from the transmitted flux fraction in the Lya 
forest of over 48,000 quasars combined with CMB observations as showed by Busca et al. [38]. 

Assuming that H{z) follows a Gaussian distribution and given that the error of each 
measurement is independent, we have that the likelihood is 


20 

— 2 In Ljj = ^ ^ 
^=1 


^2 + 


H{z)ra 

1+z 


_ ^obs 


Z = Z21 


(5.7) 


where and CTj are the data points and their respective errors, H{z) = HqE{z) and E{z) 
is given by eq. (2.7). The observed values of the 20 measures of the Hubble function are 
depicted in table 2 in appendix A. The last observable is described in the footnote of 
the same table. 


6 Methodology and Validation 

The observational data set represent just one realization of their underlying data probability 
distributions. So by construction, we cannot access the bias of an estimated function, e.g., 
q{z), by htting it using this data set. As we discussed in section 4.3, the q{z) bias is inferred 
knowing both the expected value and the true value of q{z) [see eq. (4.10)]. This last is the 
missing piece to compute the bias-variance trade-off. However, one can indirectly infer the bias 
without knowing the underlying q{z)}^ First, one hts the model obtaining the best ht for the 
real data. Then, using this best ht as the hducial model, several mock catalogs are generated 
and, following the description in section 4.3, one calculates the bias and, consequently, the 
bias-variance trade-off. 

The problem with this procedure is that one must choose the parameters of the recon¬ 
struction method to compute the best ht. In particular, we have to hx the number of knots 
and the complexity of the function q{z)^ through the penalization parameter (Trei- Thus, if 
the initial best ht for a given cjrei is already signihcantly biased, so will be the bias-variance 
trade-off analysis. In other words, performing a MC analysis of the bias-variance trade-off 
around the best ht does not take into account the variance of the estimated curve. Therefore, 
a better approach consists in using not only the best ht curve but a set of curves inside some 
statistical signihcance, i.e., curves whose parameters are inside some conhdence interval of 
the best ht. This means that the procedure should be capable of reconstructing not only the 
best ht curve but also every other curves inside some signihcance level. 

There is a high computational cost to study the bias-variance trade-off for a given hducial 
curve. It is necessary to resample from the model m times and to hnd the best ht for each 
realization. The whole calculation must be performed for different values of a^ei until the best 
bias-variance trade-off is attained. 

^°There are methods to infer the bias without the knowledge of the true underlying model, e.g., boot¬ 
strap [61]. In this work we do not explore these approaches, since their application is not straightforward for 
correlated data. 
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In this work, we address the problem by performing the MC analyses for three different 
fiducial curves and seven Crei values (see sections 6.1 and 6.2). The functional forms of 
these fiducial models were purposely defined to have quite different features as shown in 
figure 2. Given these distinct scenarios, we can explore the capability and efficiency of the 
proposed method in reconstructing q{z) and also verify any dependence on the underlying 
model. Essentially, we want to find the drei value which best reconstructs all the three fiducial 
models. 

We carry out this study considering: (i) only the SN la data and (ii) jointly the SN la, 
BAO and H{z) measurements, in sections 6.1 and 6.2, respectively. Therefore, each realization 
is a (pseudo)randomly-generated catalog of SNe la or random samples of all 

three observables, i.e., {{mB,a,Xa,Ca},{{Dv/rd)b],{Hc]], where a = 1,...,740, b = 1,...,6 
and c = 1, ...,21. The methodology and algorithms to generate these samples are described 
in appendix A. 

We use the Monte Carlo object of the Numerical Cosmology Library (NumCosmo) [53], 
called NcmFitMC. This object proceeds as describe in algorithm 1. The code generates a 
minimum of prerun mock samples and the catalog of the best fit for each sample. 

In the list below, we summarize the steps to obtain q{z) using the MC method: 

1. Define the redshift domain D = [zmini Zmax]-, which is determined by the real data 
sample as described in appendix A. 

2. Define the fiducial model where z G D. 

3. Choose a Urei value and the number of knots n, which will be the same for all values 
to be tested. 

4. Run the MC algorithm NcmFitMC for m = prerun minimum realizations. 

5. Finally, compute the MSE. 

These steps are repeated for different values of firei and their respective results are compared 
in order to determine the best scenario which minimizes the MSE [eq. (4.10)] for a given m;, 
[eq. (4.11)]. 

6.1 Monte Carlo analyses: SNe la 

We carry out the first analysis using the SNe la data, such that the realizations are generated 
using the covariance matrix and redshifts of the JLA sample (see appendix A). Thus, D = 
[0.0,1.3] which we divide in 7 equally spaced intervals, i.e., n + 1 = 8 knots.The first and 
second fiducial models, denoted as g^^^(z) and q^'^'^{z), are defined by the blue and green 
curves, respectively, in figure 2. The third one, ^'^^^(z), is the ACDM model (red curve), in 
which the cold dark matter and baryon density parameters are Dc = 0.3 and = 0.05 and the 
DE EoS is rc = —1.0. In all scenarios we assume a flat universe (Dfc = 0).^^ Finally, these three 
fiducial models are completely defined by fixing the SN la astrophysical parameters, namely 
Q-fid _ 0.141, (5^'^ = 3.101, = —19.05 and = —19.12. These values correspond to 

Following the discussion presented in the end of section 4.1, we also performed the analyses using n +1 = 6 
and n + 1 = 10 knots. The results showed a small improvement from 6 to 8 knots, and a negligible variation 
from 8 to 10 knots. 

^^The densities flc and are the energy densities divided by the critical density today. 
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Input: fiducial model. 

Input: Lp{D,9) —)• data set likelihood. 

Input: prerun —minimum number of samples. 

Input: Ire —>■ largest relative error. 

Result: 6^^^ best fit catalog. 

I = 0 
repeat 

Generate catalog from the hducial model 
Find the best ht 6^^^ for 

Update the sample estimates of {6i) and Cov{6i,6j). 

Estimate the mean standard deviation = y^Var(0j)/(/ + 1). 

Calculate the largest relative error on the mean, Ire^^^ = max({(T^g.^/(0j)}). 
Store the best ht 9^^^ and the value of the likelihood —21n(Lp(i3(^), ^)). 

1 = 1 + 1 . 

until I > prerun and Ire^^^ < Ire; 

Algorithm 1: NcmFitMC object implemented in NumCosmo. 



Figure 2. The three q^^{z) hducial models for which we study the reconstruction method via cubic 
spline. 


the best-ht obtained in [30] considering ACDM and SNe la data (including both systematic 
and statistical errors). 

In order to obtain the features of the reconstruction procedure as a function of the penalty 
factor (through cJrei) and we perform the MC analyses considering 7 different (Trei 

values, fjrei = {5%, 15%, 30%, 50%, 75%, 100%, 150%}, for each hducial model. Independently 
of fXrei and we standardize our analyses hxing the number of realizations to m = 42000, 

with which we obtain small Ire < 1% (algorithm 1) in ah cases. Then, for each mock catalog. 


the reference, the authors use a different parametrization (Mb, Aa/) such that Mi = Mb and M2 = 
Mb + Am- 
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Figure 3. The top part of each panel shows the reconstructed curve of q{z) using 8 knots, MC 
approach and sampling from SNe la data. The colored (blue, red and green) lines and shaded regions 
are the mean function {q{z)) and their Ict error bar, respectively, obtained for a given cTrei and 
The black lines correspond to (upper panel), q^^^ (middle) and ACDM (lower). The bottom part 
of each panel shows the bias (dashed lines) and its Icr error bar of the mean curve. 
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we minimize the function 


n—1 

- 21n(LsNIa,p) = -21n(LsNIa) + ^-Pi(<7rel), (6.1) 

i=2 

with respect to the free parameters 

0 = {go, gi, g2, gs, g 4 , %, %, gz, /?, Mi, M2}, (6.2) 

where the right-side terms of eq. (6.1) are given in eqs. (5.1) and (4.3), respectively. Therefore, 
the expected values {q{z;are\)) and the error bars a{q{z-, arei)) = i/\ai{q{z-, (Trei)) of the 
reconstructed function g(z; cirei) are estimated by computing eqs. (4.6) and (4.12). 

Figure 3 displays the reconstructed curves (colored solid lines) for cirei = 5%, 50% and 
150%, along with their respective fiducial models (black lines). As we discussed in section 4.2, 
a small (Trei implies in a big constraint on the function complexity. Indeed, we see that 
iTrei = 5% imposes q{z', 5%) to be a linear function independently of the underlying fiducial 
model (upper-left part of the three panels). 

Naturally, if the true model differs from a linear function, the result will be strongly 
biased and, consequently, the estimated curve will not be capable to recover the true form. 
In particular, q(z; 5%) is outside the Icr error bar in a large fraction of the redshift interval 
for q^‘^^{z) and q^'^‘^{z). The bottom-left part of the three panels in figure 3 show the bias 
(colored dashed lines) as a function of the redshift and its la error bar. 

Overall, until 2 ; < 0.6, the bias b(z; (Trei) decreases as the q(z; Urei) function complexity 
(i.e., (Trei) increases, such that the reconstructed function approximates better and better the 
fiducial curve. On the other hand, due to the small amount of data at higher redshifts, mainly 
for 2 : > 1.0, the standard deviation a(q(z; a^ei)} greatly increases, as evinced in figure 3 by 
the colored shaded areas. As a direct result, we have that the constraints on the highest 
parameters g* are degenerated causing an increment on b(z; 150%), for z > 0.6, in comparison 
to b{z', 50%) for all three fiducial models, as shown in figure 3. 

In order to define which cTrei provides the best reconstructed curve, we compute the 
MSE [eq. (4.10)] as a function of z of all 21 reconstructed curves. Figure 4 shows the MSE 
(solid lines) for each (Trei and g'®‘^^(z) (upper panel), g^'^^(z) (middle) and ACDM (lower) 
models. For these three fiducial models, g(z; 5%) is the function with the smallest MSE for 
any nib- However, these are also the most biased reconstructed curves and, even through 
visual inspection (figure 3), they do not give satisfactory reconstructions to their respective 
true g^'^(z). Nevertheless, if we have only analyzed the fiducial curve closer to the best fit 
one (fiducial 3), we would wrongly conclude that cjrei = 5% provides the smallest MSE with 
a insignificant bias (see the last panel in figure 4). 

More importantly, when performing the analysis with real data, we will be able to 
estimate the error bars but not the bias. We can note in figure 4 that the MSE for (Trei = 5% 
has about the same contribution from bias and variance. Thus, if we choose the smallest 
MSE (cJrei = 5%) disregarding mb, the estimated error in the real data analysis would provide 
only half of the total uncertainty in the reconstruction. In a conservative approach one 
would estimate the variance and then double by hand the error bars.^^ Notwithstanding, 
inspecting the a^ei = 5% reconstructions in figure 3, we note that this reconstruction looses 
all information about the shape of the curve. So even correcting the error bar for g(z) by 

a more careful analysis, one could calculate the bias for several fiducial models and their respective 
upper limits. Then, this value could be added to the curve obtained from the real data. 
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Figure 4. The top part of each panel shows the MSE of the reconstructed q{z) for 7 different Crei 
values G [5%, 150%], and obtained using 8 knots, MC approach and sampling from SNe la data. The 
upper, middle and lower panels refer to and ACDM models, respectively. The MSE 

decomposition into variance (dotted lines) and squared bias (dashed lines) is displayed in the bottom 
part of each panel. 
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doubling the computed error, for example, the form of the curve will always be close to a 
straight line. 

The objective of this work is to reconstruct the form of the kinematic curve. We select 
the best reconstructed function (smallest MSE) requiring the bias to be at most 10% of the 
total error, i.e., nib = 0.1. Making this imposition, the estimated variance for the fit using real 
data will provide a good approximation of the uncertainty of the reconstruction. Therefore, 
taking into account the three fiducial models, we find that the best bias-variance trade-off as 
a function of z is achieved for Urei = 30%. It is worth noting that the MC results for a, /3, 
Ml and M 2 are unbiased (bias smaller than 0 . 1 %) for all 21 cases that we studied. 

6.2 Monte Carlo analyses: SNe la + BAO + H{z) 

In this section, we perform the MC analyses combining the SNe la, H(z) and BAO data. 
In this case, we equally divide the redshift interval, D = [0.0, 2.3], using n -|- 1 = 12 knots. 
The fiducial models and parameters are those defined in section 6.1 and, additionally, = 
103.5/i“^Mpc and = 73.0kms“^Mpc“^. Similarly, the MC study is done for the same 
iTrei set and the three fiducial models. In view of the increased amount of data, the present 
q{z) reconstruction is better constrained and m = 20000 mock catalogs (for each MC analysis) 
are sufficient to provide (^(z; cJi-ei)) with small Ire < 1% (see algorithm 1). 

Since the SN la, BAO and H{z) data sets are independent, the joint likelihood is 

n—1 

- 21n(LsBH,p) = -2 (InLsNia + IuLbao + InL//) -|- ^ B^cTrei), (6.3) 

i=2 

where —2lnLsNla, —21nLBAO and — 21 nL/^( 2 ) are given by equations (5.1), (5.6) and (5.7), 
respectively. Then, for each realization, we compute the best-fitting values of the following 
18 parameters, 

G = {qo, qi, q 2 , qs, Qi, 95 , 96 , 97 , 98 , 99 , 9io, 9ii, «, /?, Mi, M 2 ,Ho,rd}, (6.4) 

and, with them, we calculate the expected value of each parameter estimator, (8), and their 
covariance matrix. 

Analogously to the results presented in section 6.1, in figure 5 we show the reconstructed 
curves, ( 9 ( 2 ; CTrei)) (colored solid lines), the biases (dashed lines) and their respective la error 
bars (shaded areas) for Urei = 5%, 50% and 150% and the three fiducial models. As expected, 
combining SN la, BAO and H{z) data decreases a{q{z', (Trei)) and we note, in the three upper- 
left panels of figure 5, that these improved constraints no longer restrict {q(z;5%)) to be a 
linear function.Besides, as displayed in figure 6 , it also leads to a stronger dependency of 
b{z', fJrei) on the fiducial model. 

For example, the squared bias function of q^'^‘^{z) (dashed lines on the middle panel of 
figure 6 ) presents a large variation with respect to iTrel, similar to what we obtained considering 
only SN la data. On the other hand, the bias function of the ACDM fiducial model is nearly 
invariant with respect to Urei (lower panel of figure 6 ). This highlights that, giving a sufficient 
amount of data, if the true model has low complexity form then, naturally, a satisfactory 
reconstruction will rapidly be reached even for small cJrei- In particular, taking into account 
the bias-variance trade-off and mb = 0.1, the best reconstructed curve for the ACDM fiducial 
model is obtained for cJrei = 15%. 

^®This is natural since, in this case, we have more knots and we would need a smaller a^ei value to constrain 
into a linear curve. 
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Figure 5. The top part of each panel shows the reconstructed curve of q{z) using 12 knots, MC 
approach and sampling from SNe la, BAO and H{z) data. The colored (blue, red and green) lines 
and shadow regions are the mean function {q{z)) and their Icr error bar, respectively, obtained for a 
given CTrei and The black lines correspond to (upper panel), (middle) and ACDM 

(lower). The bottom part of each panel shows the bias (dashed lines) and its Icr error bar of the mean 
curve. 
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Figure 6. The top part of each panel shows the MSE of the reconstructed q{z) for 7 different drei 
values S [5%, 150%], and obtained using 12 knots, MC approach and sampling from SNe la, BAO 
and H{z) data. The upper, middle and lower panels refer to q^^^{z), q^‘^^{z) and ACDM models, 
respectively. The MSE decomposition into variance (dotted lines) and squared bias (dashed lines) is 
displayed in the bottom part of each panel. 
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As it was shown in fignres 4 and 6, the MSE is dominated by the variance of the fitted 
parameters and, conseqnently, by the measnrement errors of the observable data sets. In 
this way, we note that the MSE is independent of the fidncial model, depending only on the 
penalty factor, i.e., drei- This aspect points ont the fact that an statistical inference with 
relative small error bars can be completely misleading. For example, the well-constrained 
fnnction q{z; 5%), obtained for is not at all a good reconstrnction of the trne model, 

diverging with more than Icr in a large fraction of the redshift interval. Ultimately, following 
the discnssion on section 3.1, the natnre of the biases for = 5% and 150% can be classified 
as form-bias and estimator-bias, respectively. 

As in section 6.1, we expect that a good reconstrnction is one which does not present 
a significant bias. Therefore, applying the same reqnirement nib = 0.1, we obtain that 
iTrei = 30% provides the best balance between bias and variance, given the cnrrent data sets, 
to reconstrnct the q{z) cnrve. 

In snmmary, the MC ontcomes (sections 6.1 and 6.2) show that onr method is efficient 
in reconstructing q(z) and, given the cnrrent errors of the observational data, cJrel = 30% is a 
safe and conservative choice to recover q{z) imposing minimal assnmptions and gnaranteeing 
that the reconstrncted cnrve will not be bias dominated. 

7 Results 

Defined the best estimator, g(z;30%), we now obtain the deceleration fnnction given the 
real JLA, BAO and H{z) samples (see section 5). As before, we reconstrnct q{z) (i) in the 
redshift interval D = [0,1.3] nsing JLA SN la data and (ii) in D = [0,2.3] combining those 
three observable data. 

The likelihood fnnction is now interpreted as the posterior distribntion 

V{e\D) = Lp{D\9), 

since we are assnming flat priors on all parameters. We then nse the NnmCosmo algorithm 
NcmFitESMCMC,^® which implements an ensemble sampler with affine invariance for Markov 
Chain Monte Carlo analysis, to compnte the mean fnnction q{z) given the JLA sample and its 
68.27%, 95.45% and 99.73% confidence intervals (Cl). We ran 100 chains, compnting 8 x 10^ 
points in the 12-dimensional parametric space, shown in eq. (6.2), attaining a mnltivariate 
potential scale redaction factor (MPSRF) eqnal to 1.016.^^ The best-fitting valnes and the 
error bars of the SNe la parameters are a = 0.141 ± 0.007, /3 = 3.108 ± 0.081, Mi = 
-19.05 ± 0.03 and Ma = -19.12 ± 0.03. 

The mean (bine line) and Cl’s (bine shaded areas) are displayed in the left panel of 
fignre 7, where we note that the ESMCMC 68.27% Cl is in agreement to the respective MC 
error bar. Natnrally, as in the MC stndy, the difficnlty in constraining q[z) with minimal 
assnmptions is that the resnlt is highly degenerated. So, besides the fact that q{z) < 0 in the 
entire redshift, this is not statistically significant. The main resnlts of this q{z) reconstrnction, 
nsing only SNe la data, are: the indication of a transition redshift at zp — 0.53 with 68.27% 
significance level, and the evidence of an accelerated expansion phase > 99.73% within the 
redshift interval ~ [0.04,0.19]. 

^®The algorithm is describe in [62] and was implemented in C in the NumCosmo library [53]. Another 
unrelated implementation in Python is described in [63]. 

^^The MPSRF is a diagnose tool used to check the convergence of a Markov Chain, which was originally 
proposed in [64]. A value <1.2 indicates the convergence. 
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Figure 7. The model-independent reconstructed q{z) (blue solid line) and its 68.23%, 95.45% and 
99.73% confidence intervals (blue shaded areas), for cTrei = 30%, using SNe la (left panel) and SNe la 
+ BAO + H{z) data (right panel). The red line and contours are the q{z) mean and its Cl’s obtained, 
assuming XCDM model, fitting Vic (with w = —1) (left panel) and {Hq,VIc,w) (right panel) along 
with the SNe la nuisance parameters. The black lines correspond to the Planck+BAO+JLA+TJo 
best-fit assuming ACDM model. 


To compare this result with the flat ACDM model, i.e., assuming GR and DE EoS given 
by re = —1.0, we carry out the ESMCMC in the 5-dimensional parametric space 

0 = (D„a,/3,Mi,M2). 

For this, we fixed the other cosmological parameters to the JLA best-fit. We obtain the 
following best-fitting values and standard deviations: Dc = 0.244 ± 0.034, a = 0.141 ± 0.007, 
/3 = 3.103 ±0.081, Ml = -19.05 ±0.023 and M 2 = -19.12 ± 0.026. The mean (red 

line) and the Cl’s (red shaded areas) are consistent with our model-independent reconstruction 
overall redshift range, as shown in the left panel of figure 7. In both analyses, the SN la 
nuisance parameters a and /? are weakly (anti-) correlated (< 0.1) to {%} {i = 0,...,7) and 
Vic- Ml and M 2 are also weakly correlated to most parameters excepted for qq and Vic, in 
which there are moderate correlations ~ 0.56 — 0.66. At last, we also plot the Planck best- 
fitting curve (black line), ^^'’^“^( 2 ;), which we computed using the Planck±BAO±JLA±Mo 
best-fitting parameters obtained assuming ACDM [40], namely, ffg = 67.74, Vic = 0.259 and 
Dfc = 0.049. We note that is inside the 68.27% Cl in the entire redshift interval. 

We follow the same procedure to compute q(z) and their Cl in the redshift interval 
[0.0, 2.3] given the JLA, BAO and H{z) data. In this case, 8 x 10^ points are calculated 
in the 18-dimensional parametric space [eq. (6.4)], getting MPSRF = 1.033. In this case, 
the best-fitting and standard deviation values of the non-gj parameters are Hq = 71.68 ± 
1.69kms-^Mpc"S = 101.15 ± 1.8/i"^Mpc, a = 0.141 ± 0.007, P = 3.111 ±0.081, Mi = 
— 19.01 ± 0.05 and M 2 = —19.07 ± 0.05. We note in the right panel of figure 7 an expressive 
improvement on the constraints due to the combined data, regarding that we are assuming 
only FLRW metric and flat space, and given the high dimension of the parametric space. In 
this case, we obtain zt — 0.58 with 68.27% Cl and we can attest with significance > 99.73% 
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that the universe is accelerating within 0.02 ^ z < 0.22. At last, we also have an indication 
of a decelerated phase in the redshift interval 1.84 < ^ < 2.13 with 68.27%. 

We compare the model-independent q{z) reconstruction with the XCDM model, i.e., 
constant EoS for the dark energy component w = constant, performing the ESMCMC on the 
7-dimensional parametric space 


9 = {Qc,Ho,w,a,l3,Mi,M2). 

Their best-fitting values and error bars are 11c = 0.269±0.017, Hq = 71.04±1.59kms“^Mpc“^, 
w = -1.11 ± 0.07, a = 0.142 ± 0.007, /3 = 3.108 ± 0.081, Mi = -19.03 ± 0.050 and 
M 2 = —19.10 ± 0.05. In figure 7 (right panel), we plotted the results, (red line) 

and the Cl’s (red shaded areas), showing that XCDM model is also compatible within 99.73% 
confidence level with the reconstructed curve in the entire redshift interval. It is worth noting 
that our conservative analyses provide only lower bounds for zt — 0.19 (JLA) and ~ 0.22 
[JLA+BAO+77(z)] within 99.73%. On the other hand, model-dependent works can estimate 
better constrained intervals for zt [65, 66], as evinced by the XCDM results. However, as we 
discussed in sections 1 and 3, these estimates are accurate as long as the form-bias is small. 

Besides the results presented so far, we can also infer other kinematic quantities from 
the reconstructed function q(z) (see sections 1 and 2), such as H{z) and the jerk function 
j{z) = —'a/{aH^) (i.e., the third order term of the scale factor expansion). The functions 
H{z) and j{z) are estimated by integrating and differentiating, respectively, q{z) with respect 
to z. Figure 8 shows the results obtained using only JLA^® (left panels) and JLA+BAO+77(2:) 
(right panels). The upper panels display the means H{z) (blue lines) and the 68.23%, 95.45% 
and 99.73% Cl’s (blue shaded areas). 

As the H{z) estimate is related with q{z) by a numerical integration, it is consequently 
better constrained and the enhancement due to the combined data is even more apparent, 
as shown figure 8 (upper right panel), where the 21 H{z) measurements are also plotted. 
As before, we compare these results with the respective outcomes assuming, respectively, 
ACDM and XCDM models (red lines and shaded areas). The decrease in the Cl’s of the 
reconstructed H(z) highlights its concordance with those cosmological models within 99.73% 
level. The Planck best-fit (z) (black lines) also lies inside the Cl’s over the entire 

redshift interval for JLA+BAO+H(z), and it is outside the 99.73% Cl only in 0.0 ^ z < 0.03 
when we use JLA. 

Contrarily to the above scenario, j{z) is obtained by numerical differentiation implying in 
high degenerated results, as displayed in the lower panels of figure 8. The means j{z) and the 
Cl’s are represented by the blue lines and blue shaded areas, respectively. The estimated jerk 
functions, obtained assuming ACDM (left panel) and XCDM models (right panel), correspond 
to the red lines and red shaded areas. Besides the redshift intervals [0.14,0.39] (left panel) 
and [0.13,0.45] (right panel), which provides —6 < j{z) < 6 with > 99.73% significance, the 
degenerated results do not provide relevant information about this kinematic function. 

At last, in this work we also obtain a model-independent estimate of the sound hori¬ 
zon, since is fitted throughout the reconstruction procedure. Thus, we obtain = 
101.15 ± 1.8/i“^Mpc given the combined analysis in the 18-dimensional parametric space 
using JLA+BAO+77(z) data. Our result is in accordance with that presented in refer¬ 
ence [67], where they obtained r^i = 100.7 ± 2.0/i“^Mpc (SNe+BAO+77(z) without Hubble 

^®In this case, we recover H{z) [see eq. (2.7)] considering Hq = 70.0kms“^Mpc“^, which is the reference 
value used in [30]. 
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Figure 8. H{z) (upper panels) and j{z) (lower panels) mean estimates (blue solid lines) and their 
68.23%, 95.45% and 99.73% confidence intervals (blue shaded areas) obtained integrating and deriving, 
respectively, the reconstructed q{z) using SNe la (left panels) and SNe la + BAO + H{z) data (right 
panels). Similarly, the red line and contours are the H{z) and j{z) means and their Cl’s obtained 
integrating/deriving the respective q{z) where the ACDM (left panels) and XCDM (right panels) 
models were assumed. The black lines correspond to the Planck+BAO+JLA+77o best-fit assuming 
ACDM model. The black dots and error bars represent the 21 H{z) measurements used in this work. 


prior) and = 101.9 ± 1.9/i“^Mpc (idem with Hubble prior) assuming a linear spline for 
h~^{z) = 100kms“^Mpc“^/f7(2:), 6 knots equally-spaced and curved universe. It is worth 
noting that their reconstruction was directly applied to the data and no test regarding the 
assumed curve was performed. 


8 Conclusions 

In this work we presented a general model-independent approach to reconstruct any one- 
variable function. In particular, we applied it to estimate the deceleration function q{z) using 
the JLA SN la, BAO and H{z) data sets. We performed a conservative analysis in the sense 
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that we reconstructed q{z) considering minimal assumptions (FLRW metric and flat universe) 
and also fitting simultaneously Hq, and the SN la astrophysical parameters (a, 13, Mi, M 2 ). 

The MC results presented in sections 6.1 and 6.2 show that it is crucial to estimate both 
the form- and the estimator-biases in order to validate any proposed reconstruction method, 
independently if it is parametric or non-parametric. They also reveal that the MSB’s are 
independent of the underlying fiducial model and are dominated by the variance of the current 
data sets, even for cases where the reconstructed functions are far to be a good representation 
of the true curves, i.e., highly biased. Nevertheless, a blindly minimization of the MSB could 
lead to a bias dominated reconstruction. Such reconstructions must be carefully analyzed and 
the bias added by hand or estimated in some way. Bven so, the information about the shape 
of the curve is mostly lost. Thus, requiring the bias to be at most 10% of the total MSB, 
we evaluated the bias-variance trade-off obtaining that, currently, the best penalty factor is 
given by Crei = 30%. 

Bven though the MSB is approximately independent of the fiducial model, analyzing 
MSB of only one model can lead to misleading conclusions. If we had only used the ACDM 
fiducial model, the smaller firei = 5% would prove to be the best choice, i.e., smaller MSB 
and insignificant bias. In other words, if a method is good in reconstructing a ACDM shaped 
curve, this does not mean that the same method will be able to reconstruct other curves 
close to it. In addition, capping the bias at 10% is a straightforward way to obtain the 
total uncertainty of the estimates, since in this case we can use directly the variance analysis 
without requiring additional bias estimation. 

Our main reconstructions are summarized in figure 7, given that there are no assumptions 
on the gravitational dynamics nor on the matter content, and we also include the relevant 
astrophysical parameters into the study. The standard cosmological model agrees with the 
reconstructed results within 99.73% Cl in the entire redshift intervals. This is even further 
evident observing the estimate for the Hubble function in figure 8. In this sense this work 
discards large deviations from the standard model with the caveat that we are assuming 
the FLRW metric and we are not testing this hypothesis. Notwithstanding, with more data 
available, we advocate that to measure departures from the standard model, one should repeat 
the process described in this work, evaluating the capability of the model to differentiate 
between a whole class of fiducial models. 
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A Simulated data 

In this appendix we describe the methodology to generate the SN la, BAO and H{z) mock 
catalogs which we used to perform the MC study in section 6. 
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A.l SN la sampling 

The SN la distance modulus ^ is written in terms of its light-curve parameters {mB,X,C) as 

= ruB - {Mh. - aX + pC), (A.l) 


where mB the observed peak magnitude in rest-frame B, X and C are the observed stretch¬ 
ing and the color at maximum brightness, respectively, and a, [3 and M^,- = {Mi, M 2 } 
are nuisance parameters [30]. Thus, in order to generate SNe la samples, we must pro¬ 
vide fiducial values of {niB, X,C), for each SN la, and their respective probability distri¬ 
butions. The fiducial values are usually determined choosing a specific set of the observ¬ 
able model parameters. In this case, from eq. (A.l) and the theoretical distance modulus, 
= 5 logig(PL(z*'®*, z^'^^)Hq/c) + 25, we can write the fiducial magnitude as function of Xi 
and Ci, 

= 5 logio [T^l 9 “)] - + M^f - 5 logio(c/Mfi^) + 25, (A.2) 

where i denotes the SN la index and the SN la redshift values are always kept 

fixed to their observable estimates. As described in section 6.1, the fiducial parameters are 


{a^^, Mf'^, Mf^, = {0.141,3.101, -19.05, -19.12, 73.0}, 

and they correspond (besides Hq'^) to the best-fitting values assuming ACDM model obtained 
in reference [30]. We consider the three different ^^^( 2 ;) represented in figure 2. At this point, 
we still have to define the fiducial values for Xi and Ci. 

However, there is no astrophysical models for X and C. Therefore, we define the SNe la 
fiducial model {m^,, Xf'^, Cf'^) as being the best-fitting values obtained by maximizing, with 
respect to all Xi and Ci, the multivariate Gaussian distribution [30] 


G(C,C 


cmp J 


oS '^cmps 




TT 


l37V| 


-cmp I 


(A.3) 


where |...| denotes the determinant, is the complete inverse covariance matrix (with 2220 
rows and columns), N is the number of SNe la of the JLA sample (740) and 


C = ■■■,'mBM^^N+l, ■■■,X2N,C2N+1, , 


with {mBi} given by eq. (A.2).^® 

Finally, writing the SNe la variables as 


niBi = + 5mBi, Xi = Xf'^ + 6Xi and Ci 


Cf+ 5Ci 


(A.4) 


we create a SN la sample {m^., W,Cj} by randomly generating 2220 values from the normal 
distribution with variance Ccmp and zero mean corresponding to 

( 5 C = {SmBi, , JXtv+i, SX2N, SC2N+1, •••, ^Csat) . 


^®The covariance matrix Ccmp also depends on the intrinsic standard deviation of each SN la subsample, i.e., 
SDSS, SNLS3, low-z and HST (Hubble Space Telescope), respectively. In particular, 

we used original values calibrated in [30]. 
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Table 1. BAO data 


Reference 

z 

Dy{z)/rd 



c~* 

^BAO 




Rentier et ah* 

0.106 

0.336 

0.015-2 

0 

0 

0 

0 

0 

Padmanabhan et al. 

0.35 

8.88 

0 

34.60 

0 

0 

0 

0 


0.44 

11.550 

0 

0 

4.8116 

-2.4651 

1.0375 

0 

Kazin et al.^ 

0.60 

14.945 

0 

0 

-2.4651 

3.7697 

-1.5865 

0 


0.73 

16.932 

0 

0 

1.0375 

-1.5865 

3.6498 

0 

Ross et al.^ 

0.15 

4.466 

- 

- 

- 

- 

- 

- 


^ In Ref. [31] the authors provide r^/Dy{z). Note that it is the reciprocal of the provided 
in the other references. As this data is not correlated with the others, we can include it 
directly in Lbao [eq. (5.6)]. 

^ Refs. [33] and [34] provide Dy{z){r^^/vd)-, where = 148.6 and 148.69/i“^Mpc, respec¬ 
tively. We consider these values to build Lbao- 

A.2 H{z) and BAO sampling 

Differently from the SN la sampling, the BAO and H{z) mock catalogs can be directly 
generated, since we have theoretical models for the respective observables, i.e., Dy{z)/rd [or 
Td/Dy{z)\^ H{z) and H{z)rd/{1 + z), as displayed in eqs. (5.4), (5.5) and (2.7). 

Each catalog correspond to where i = 1,...,20, and the point {77°*’®}. 

The 21 redshift values are not altered in the sampling procedure and they are equal to the 
observed data z of table 2. We obtain 77°*’® writing it as 

and, then, we randomly generate the quantity 6Hi from a Gaussian distribution with mean 0 
and standard deviation at (last column of table 2), since 77°*°® follows a Gaussian distribution. 
Analogously, we generate the point 77°*’® using a zero mean Gaussian with standard deviation 
1721 and add the result to H{z 2 i)rd /(1 + Z 2 i)- The value of 77**'^(zi) correspond to the fiducial 
value, which is defined by q^'^{z), Hq'^ = 73.0kms“*Mpc“* and r^'* = 103.5/i“*Mpc. 

Applying the same methodology, we generate the first two points of the BAO set 
{Dy./rd}, where Dy{zj)/rd = D^{zj)/r^ + 6bj, the fiducial values are also defined by 
g'**^(z), 77 q'^ and and their respective inverse variances are presented in the first two rows 
of table 1. The three correlated BAO points, related to Kazin et al. [33] data, are resampled 
randomly generating the 3 values 6bj from a multivariate Gaussian distribution with means 
0 and covariance matrix provided in reference [33], see table 1. 

The last BAO point {j = 6) follows a different probability distribution, which is defined 
by the likelihood function Lfioss provided by Ross et al. [34] and centered in the observed 
value {Dyjrd)°^^ = 4.466. Thus, to produce a mock catalog from a fiducial model we perform 
the inverse transform sampling. First, we shift this function, L^jioss such that the new center 
is 71y*(zj)/r^'*. Then, we randomly generate a number u from a uniform distribution (0, 1 ) 
and, finally, we invert the equation 

pcx. 

u= da'L%^^^{a') (A.5) 

J Oil 


30- 






obtaining a{u). The lower bound a* corresponds to the lower bound described by the Lross 
function. 


Table 2. H{z) data 


Reference 

z 

H{z) 

a 

Reference 

z 

H{z) 

a 

Riess et al. ]36] 

0.0 

73.8 

2.4 


0.1 

69 

12 


0.18 

75 

4 


0.17 

83 

8 


0.20 

75 

5 


0.27 

77 

14 


0.35 

83 

14 


0.4 

95 

17 

Moresco et al. ]37] 

0.59 

104 

13 

Stern et al. ]35] 

0.48 

97 

60 

0.68 

92 

8 

0.88 

90 

40 


0.78 

105 

12 


0.9 

117 

23 


0.88 

125 

17 


1.3 

168 

17 


1.04 

154 

20 


1.43 

177 

18 






1.53 

140 

14 

Busca et al. ]38]^ 

- 

- 

- 


1.75 

202 

40 


^ The observable of this entry is related to H{z 2 i)rd /(l + ^ 2 i)) instead of sim¬ 
ply H(z) as the other data points. In this entry the redshift is 2:21 = 2.3, the 
observable value and its standard deviation are = 224kms“^Mpc“^ 
and (T 21 = 8kms“^Mpc“^, respectively. 
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